APPARATUS AND METHOD FOR EXTRACTING A DIRECT/AMBIENT SIGNAL FROM A DOWNMIX SIGNAL AND SPATIAL PARAMETRIC INFORMATION
Patent abstract:
An apparatus and method for extracting a direct and/or ambient signal from a downmix signal and spatial parametric information are described, the downmix signal and the spatial parametric information representing a multi-channel audio signal having more channels than the downmix signal, where the spatial parametric information comprises inter-channel relationships of the multi-channel audio signal. The apparatus comprises a direct/ambient estimator and a direct/ambient extractor. The direct/ambient estimator is configured to estimate level information of a direct part and/or an ambient part of the multi-channel audio signal based on the spatial parametric information. The direct/ambient extractor is configured to extract a direct signal portion and/or an ambient signal portion from the downmix signal based on the estimated level information of the direct or ambient portion.
Publication number: BR112012017551B1
Application number: R112012017551-3
Filing date: 2011-01-11
Publication date: 2020-12-15
Inventors: Juha Vilkamo; Jan PLOGSTIES; Bernhard NEUGEBAUER; Jürgen Herre
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V
IPC main class:
Patent description:
DESCRIPTION The present invention relates to audio signal processing and, in particular, to an apparatus and method for extracting a direct / ambient signal from a downmix signal and spatial parametric information. Additional embodiments of the present invention relate to the use of direct / ambient separation to enhance the binaural reproduction of audio signals. In addition, the additional achievements refer to the binaural reproduction of multi-channel sound, where multi-channel audio means audio having two or more channels. Typical audio content having multi-channel sound is film soundtracks and multi-channel music recordings. The special human hearing system tends to roughly process sound in two parts. There is, on the one hand, a localizable or direct part and, on the other hand, a non-localizable or ambient part. There are many audio processing applications, such as binaural sound reproduction and multi-channel upmixing, where it is desirable to have access to these two audio components. In the art, direct / environment separation methods, as described in "Primary-ambience signal decomposition and vector-based localization for spatial audio coding and enhancement", Goodwin, Jot, IEEE Inti.Conf. On Acoustics, Speech and Signal proc, April 2007; "Correlation-based ambience extraction from stereo recordings", Merimaa, Goodwin, Jot, AES 123rd Convention, New York, 2007; "Multiple-loudspeaker playback of stereo signals", C. Faller, Journal of the AES, Oct. 2007; "Primary-ambient decomposition of stereo audio signals using a complex similarity index"; Goodwin et al., Pub. No: US2009 / 0198356 Al, Aug 2009; "Patent application title: Method to Generate Multi-Channel Audio Signal from Stereo Signals", Inventors: Christof Faller, Agents: FISH & RICHARDSON PC, Assignees: LG ELECTRONICS, INC., Origin: MINNEAPOLIS, MN US, IPC8 Class: AH04R500FI, USPC Class: 381 1; and "Ambience generation for stereo signals", Avendano et al., Date Issued: July 28, 2009, Application: 10 / 163,158, Filed: June 4, 2002 are known, which can be used for different applications. The prior direct-ambient separation algorithms of the prior art are based on the comparison of inter-channel stereo sound signals in frequency bands. Furthermore, in „Binaural 3-D Audio Rendering Based on Spatial Audio Scene Coding", Goodwin, Jot, AES 123rd Convention, New York 2007, binaural reproduction with ambient extraction is addressed. Ambient extraction in connection with binaural reproduction is also mentioned in J. Usher and J. Benesty, "Enhancement of spatial sound quality: a new reverberationextraction audio upmixer," IEEE Trans. Audio, Speech, Language Processing, vol. 15, pp. 2141-2150, Sept. 2007. The latter document focuses on ambient extraction in stereo microphone recordings, using adaptive least squares average cross-channel filtering of the direct component on each channel. Spatial audio codecs, for example, MPEG surround, typically consist of one or two audio streams channel in combination with spatial parallel information, which extends audio across multiple channels, as described in ISO / IEC 23003-1 - MPEG Surround; and Breebaart, J., Herre, J., Villemoes, L., Jin, C ., Kjõrling, K., Plogsties, J., Koppens, J. (2006). "Multi-channel 5 goes mobile: MPEG Surround binaural rendering". Proc. 29th AES conference, Seoul, Korea. 
However, modern parametric audio coding technologies, such as MPEG-surround (MPS) and parametric stereo (PS) provide only a small number of 10 audio downmix channels - in some cases, only one - along with additional spatial parallel information. The comparison between the "original" input channels is then only possible after first decoding the sound in the desired output format. Therefore, a concept to extract a part of a direct signal or a part of ambient signal from a downmix signal and spatial parametric information is necessary. However, there are no existing solutions for direct extraction / environment using the parametric parallel information. Therefore, it is an objective of the present invention 20 to provide a concept for extracting a part of direct signal or a part of ambient signal from a downmix signal by the use of spatial parametric information. This objective is achieved by an apparatus, according to claim 1, a method, according to claim 15, or a computer program, according to claim 16. The basic idea underlying the present invention is that the direct extraction / environment mentioned above can be achieved when level information of a direct part or an ambient part of a multi-channel audio signal is estimated based on the spatial and spatial parametric information. a direct signal part or an ambient signal part is extracted from a downmix signal based on the estimated level information. Here, the downmix signal and spatial parametric information represent the multi-channel audio signal having more channels than the downmix signal. This measure allows a direct and / or ambient extraction of a downmix signal having one or more input channels when using 10 spatial parametric parallel information. According to an embodiment of the present invention, an apparatus for extracting a direct / ambient signal from a downmix signal and spatial parametric information comprises a direct / ambient estimator and a direct / ambient extractor. The downmix signal 15 and the spatial parametric information represent a multi-channel audio signal having more channels than the downmix signal. In addition, spatial parametric information comprises inter-channel relationships of the multi-channel audio signal. The direct / ambient estimator is configured to estimate level information from a direct part or an ambient part of the multi-channel audio signal based on spatial parametric information. The direct / ambient extractor is configured to extract a direct signal portion or an ambient signal portion from the downmix signal based on the estimated level information 25 of the direct or ambient portion. According to another embodiment of the present invention, the apparatus for extracting a direct / ambient signal from a downmix signal and spatial parametric information further comprises a binaural direct sound interpretation device, a binaural ambient sound interpretation device and a combiner. The binaural direct sound interpretation device is configured to process the direct signal portion to obtain a first binaural output signal. The binaural ambient sound interpretation device is configured to process the ambient signal portion to obtain a second binaural output signal. The combiner is configured to combine the first and second binaural output signals to obtain a combined binaural output signal. Therefore, a binaural reproduction of an audio signal, in which the direct signal portion and the ambient signal portion of the audio signal are processed separately, can be provided. 
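As a structural illustration only, and not the claimed implementation, the two components described above can be sketched in Python roughly as follows; all class, function and parameter names are hypothetical, and the estimator/extractor internals are placeholders anticipating the estimation and extraction formulas detailed later in the description.

import numpy as np

class DirectAmbientEstimator:
    """Estimates direct/ambient level information from spatial side information."""
    def estimate_levels(self, spatial_params):
        # placeholder: per-channel direct/ambient ratios derived from the
        # inter-channel relationships (ICC, level ratio sigma), see below
        icc, sigma = spatial_params["icc"], spatial_params["sigma"]
        dtt = np.clip(icc / np.maximum(sigma, 1e-12), 0.0, 1.0)
        return dtt, 1.0 - dtt

class DirectAmbientExtractor:
    """Extracts direct/ambient signal parts from the downmix tiles."""
    def extract(self, downmix_tiles, dtt, att):
        return np.sqrt(dtt) * downmix_tiles, np.sqrt(att) * downmix_tiles

class DirectAmbientApparatus:
    """Apparatus combining the estimator and the extractor."""
    def __init__(self):
        self.estimator = DirectAmbientEstimator()
        self.extractor = DirectAmbientExtractor()

    def process(self, downmix_tiles, spatial_params):
        dtt, att = self.estimator.estimate_levels(spatial_params)
        return self.extractor.extract(downmix_tiles, dtt, att)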
In the following, the realizations of the present invention are explained with reference to the accompanying drawings in which: Figure 1 presents a block diagram of an I realization, apparatus to extract a direct / ambient signal from a downmix signal and spatial parametric information that represent a multi-channel audio signal; Figure 2 shows a block diagram of an embodiment of an apparatus for extracting a direct / ambient signal from a mono downmix signal and spatial parametric information that represents a parametric stereo audio signal; Figure 3a shows a schematic illustration of the spectral decomposition of a multi-channel audio signal, according to an embodiment of the present invention; Figure 3b shows a schematic illustration for calculating inter-channel relationships of a multi-channel audio signal based on the spectral decomposition of Figure 3a; Figure 4 presents a block diagram of a realization of a direct extractor / environment with downmixing of estimated level information; Figure 5 presents a block diagram of an additional realization of a direct extractor / environment when applying gain parameters to a downmix signal; Figure 6 presents a block diagram of an additional realization of a direct extractor / environment based on the LMS solution with cross channel mixing; Figure 7a presents a block diagram of a direct / environment estimator realization using a stereo environment estimation formula; Figure 7b presents a graph of a ratio of direct energy to total versus exemplary inter-channel coherence; Figure 8 shows a block diagram of an encoder / decoder system, according to an embodiment of the present invention; Figure 9a presents a block diagram of an overview of direct binaural sound interpretation, according to an embodiment of the present invention; Figure 9b presents a block diagram of details of the binaural direct sound interpretation of Figure 9a; Figure 10a shows a block diagram of an overview of binaural ambient sound interpretation, according to an embodiment of the present invention; Figure 10b shows a block diagram of details of the binaural ambient sound interpretation of details of the binaural ambient sound interpretation of Figure 10a; Figure 11 shows a conceptual block diagram of a binaural reproduction of a multi-channel audio signal; Figure 12 presents a general block diagram of a direct extraction / environment realization including binaural reproduction; Figure 13a shows a block diagram of an embodiment of an apparatus for extracting a direct / ambient signal from a mono downmix signal in a filter bank domain; Figure 13b shows a block diagram of an embodiment of a direct extraction / environment block of Figure 13a; and Figure 14 shows a schematic illustration of an exemplary MPEG Surround decoding scheme, in accordance with a further embodiment of the present invention. Figure 1 shows a block diagram of an embodiment of an apparatus 100 for extracting a direct / ambient signal 125-1, 125-2 from a downmix signal 115 and spatial parametric information 105. As shown in Figure 1, the signal downmix 115 and the spatial parametric information 105 represents a multi-channel audio signal 101 having more Chi ... ChN channels than the downmix signal 115. The spatial parametric information 105 may comprise inter-channel relationships of the multi-channel audio signal 101. In particular, apparatus 100 comprises a direct / ambient estimator 110 and a direct / ambient extractor 120. 
Direct / ambient estimator 110 can be configured to estimate level 113 information from a direct portion or an ambient portion of the multi-channel audio signal 101 with based on spatial parametric information 105. The direct / ambient extractor 120 can be configured to extract a direct signal part 125-1 or an ambient signal part 125-2 from the downmix signal 115 based on the estimated level information 113 of the direct part or the ambient part. Figure 2 shows a block diagram of an embodiment of an apparatus 200 for extracting a direct / ambient signal 125-1, 125-2 from a mono downmix signal 215 and spatial parametric information 105 representing a parametric stereo audio signal 201. The apparatus 200 of Figure 2 comprises essentially the same blocks as the apparatus 100 of Figure 1. Therefore, identical blocks having similar implementations and / or functions are denoted by the same numbers. In addition, the parametric stereo audio signal 201 of Figure 2 can correspond to the multi-channel audio signal 101 of Figure 1, and the mono downmix signal 215 of Figure 2 can correspond to the downmix signal 115 of Figure 1. In carrying out Figure 2 , the mono downmix signal 215 and the spatial parametric information 105 represent the parametric stereo audio signal 201. The parametric stereo audio signal may comprise a left channel indicated by 'L' and a right channel indicated by 'R'. Here, the direct / ambient extractor 120 is configured to extract the direct signal part 125-1 or the ambient signal part 125-2 from the mono downmix signal 215 based on the estimated level information 113, which can be derived from the parametric information spatial data 105 by using the direct / environment estimator 110. In practice, the spatial parameters (spatial parametric information 105) in the realization of Figure 1 or Figure 2, respectively, refer especially to parallel MPEG surround (MPS) or parametric stereo (PS) information. These two technologies are methods of encoding surround audio or low bit rate stereo in the prior art. With reference to Figure 2, PS provides a downmix audio channel with spatial parameters, and with reference to Figure 1, MPS provides one, two or more downmix audio channels with spatial parameters. Specifically, the realizations of Figure 1 and Figure 2 clearly show that the spatial parametric parallel information 105 can be readily used in the direct extraction and / or ambient field of a signal (i.e., downmix signal 115; 215) that has one or more audio channels. The estimation of direct levels and / or environment (level 113 information) is based on information about inter-channel relationships or inter-channel differences, such as differences and / or level correlation. These values can be calculated from a stereo signal or from multiple channels. Figure 3a presents a schematic illustration of spectral decomposition 300 of a multi-channel audio signal (Ch1 ... ChN) to be used to calculate inter-channel relations of the respective Chi ... ChN. As can be seen in Figure 3a, a spectral decomposition of an inspected channel Chi of the multi-channel audio signal (Chi ... ChN) or a linear combination R of the rest of the channels, respectively, comprises a plurality of 301 sub-bands, in that each sub-range 303 of the plurality 301 of sub-ranges extends along a horizontal axis (time axis 310) having sub-range values 305, as indicated by small boxes of a time / frequency grid. 
In addition, the sub-bands 303 are located consecutively along a vertical axis (frequency axis 320) corresponding to the different frequency regions of a filter bank. In Figure 3a, a respective time/frequency tile X_i^(n,k) or X_R^(n,k) is indicated by a dashed line. Here, the index i denotes the channel Chi and the index R the linear combination of the rest of the channels, while the indices n and k correspond to a certain filter bank time interval 307 and filter bank sub-band 303. Based on these time/frequency tiles X_i^(n,k) and X_R^(n,k), which are located at the same time/frequency point (t0, f0) with respect to the time/frequency axes 310, 320, inter-channel relations 335, such as the inter-channel coherence (ICCi) or the channel level difference (CLDi) of the inspected channel Chi, can be calculated in a step 330, as shown in Figure 3b. The calculation of the ICCi and CLDi inter-channel relationships can be performed using the following relationships:

ICCi = Re{<X_i^(n,k) · (X_R^(n,k))*>} / sqrt(<|X_i^(n,k)|^2> · <|X_R^(n,k)|^2>),

σi = sqrt(<|X_i^(n,k)|^2> / <|X_R^(n,k)|^2>),

where Chi is the inspected channel and R the linear combination of the remaining channels, while <...> denotes a time average. An example of a linear combination R of the remaining channels is their energy-normalized sum. In addition, the channel level difference (CLDi) is typically the decibel value of the σi parameter. With reference to the above equations, the channel level difference (CLDi) or the σi parameter can correspond to a level Pi of the channel Chi normalized to a level PR of the linear combination R of the rest of the channels. Here, the levels Pi or PR can be derived from the inter-channel level difference parameter ICLDi of the channel Chi and a linear combination ICLDR of the inter-channel level difference parameters ICLDj of the rest of the channels. Here, ICLDi and ICLDj can each be related to a reference channel Chref. In further embodiments, the inter-channel level difference parameters ICLDi and ICLDj can also be related to any other channel of the multi-channel audio signal (Ch1 ... ChN) serving as the reference channel Chref. This will eventually lead to the same result for the channel level difference (CLDi) or the σi parameter. According to further embodiments, the inter-channel relations 335 of Figure 3b can also be derived by operating on different or all pairs (Chi, Chj) of input channels of the multi-channel audio signal (Ch1 ... ChN). In this case, pairwise calculated inter-channel coherence parameters ICCi,j or channel level difference parameters CLDi,j or σi,j (or ICLDi,j) are obtained, the indices (i, j) denoting a given pair of channels Chi and Chj, respectively.
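For illustration only, the following Python sketch (NumPy assumed; function and variable names are hypothetical) computes ICCi and σi for one inspected channel against the linear combination R of the remaining channels, per filter bank sub-band, using a recursive time average as the <...> operator.

import numpy as np

def interchannel_relations(X_i, X_R, alpha=0.9):
    """Estimate ICC_i and sigma_i per sub-band from time/frequency tiles.

    X_i, X_R: complex arrays of shape (num_frames, num_bands) holding the
    tiles of the inspected channel Chi and of the linear combination R of
    the remaining channels.
    alpha: forgetting factor of the recursive time average <...>.
    """
    num_frames, num_bands = X_i.shape
    p_ii = np.zeros(num_bands)                 # <|X_i|^2>
    p_rr = np.zeros(num_bands)                 # <|X_R|^2>
    p_ir = np.zeros(num_bands, dtype=complex)  # <X_i X_R*>
    eps = 1e-12
    icc = np.zeros((num_frames, num_bands))
    sigma = np.zeros((num_frames, num_bands))
    for n in range(num_frames):
        p_ii = alpha * p_ii + (1 - alpha) * np.abs(X_i[n]) ** 2
        p_rr = alpha * p_rr + (1 - alpha) * np.abs(X_R[n]) ** 2
        p_ir = alpha * p_ir + (1 - alpha) * X_i[n] * np.conj(X_R[n])
        icc[n] = np.real(p_ir) / np.sqrt(p_ii * p_rr + eps)   # ICC_i
        sigma[n] = np.sqrt((p_ii + eps) / (p_rr + eps))       # sigma_i
    cld_db = 20.0 * np.log10(sigma + eps)                     # CLD_i in dB
    return icc, sigma, cld_db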
Figure 4 presents a block diagram of an embodiment 400 of a direct/ambient extractor 420, which includes downmixing of the estimated level information 113. The embodiment of Figure 4 comprises essentially the same blocks as the embodiment of Figure 1; identical blocks having similar implementations and/or functions are therefore denoted by the same numerals. The direct/ambient extractor 420 of Figure 4, which may correspond to the direct/ambient extractor 120 of Figure 1, is, however, configured to downmix the estimated level information 113 of the direct or ambient part of the multi-channel audio signal in order to obtain downmixed level information of the direct or ambient part, and to extract the direct signal part 125-1 or the ambient signal part 125-2 from the downmix signal 115 based on the downmixed level information. As shown in Figure 4, the spatial parametric information 105 can, for example, be derived from the multi-channel audio signal 101 (Ch1 ... ChN) of Figure 1 and can comprise the inter-channel relationships of the channels Ch1 ... ChN introduced in Figure 3b. The spatial parametric information 105 in Figure 4 can also comprise downmixing information 410 to be fed to the direct/ambient extractor 420. In embodiments, the downmixing information 410 can characterize the downmix of an original multi-channel audio signal (for example, the multi-channel audio signal 101 of Figure 1) into the downmix signal 115. The downmixing can, for example, be performed using a downmixer (not shown) operating in any coding domain, such as a time domain or a spectral domain. According to further embodiments, the direct/ambient extractor 420 can also be configured to downmix the estimated level information 113 of the direct or ambient part of the multi-channel audio signal 101 by combining the estimated level information of the direct part with a coherent sum and the estimated level information of the ambient part with an incoherent sum. It is emphasized that the estimated level information can represent energy levels or power levels of the direct part or the ambient part, respectively. In particular, the downmixing of the estimated direct/ambient part energies (i.e., the level information 113) can be performed by assuming complete incoherence or complete coherence between the channels. The two formulas that can be applied in the case of downmixing based on the incoherent or the coherent sum, respectively, are as follows. For incoherent signals, the downmixed energy or downmixed level information can be calculated by

E_dmx = sum_i ( g_i^2 · E(Chi) ).

For coherent signals, the downmixed energy or downmixed level information can be calculated by

E_dmx = ( sum_i g_i · sqrt(E(Chi)) )^2.

Here, g_i is the downmix gain, which can be obtained from the downmixing information, while E(Chi) denotes the energy of the direct/ambient part of a channel Chi of the multi-channel audio signal. As a typical example of incoherent downmixing, in the case of a 5.1-to-two-channel downmix, the energy of the left downmix channel can be, for example,

E(Ldmx) = E(L) + g^2 · E(C) + g^2 · E(Ls),

where g is the downmix gain applied to the center and left surround channels.
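As an illustration of these two summation rules, the following small Python sketch (NumPy assumed; the function name and the example gain matrix are illustrative assumptions) downmixes per-band direct and ambient energies of N original channels into M downmix channels using a gain matrix taken from the downmixing information.

import numpy as np

def downmix_energies(E_direct, E_ambient, G):
    """Downmix estimated direct/ambient energies into the downmix channels.

    E_direct, E_ambient: arrays of shape (N, num_bands) with the estimated
    direct and ambient energies of the N original channels.
    G: downmix gain matrix of shape (M, N), so that downmix_m = sum_i G[m, i] * Ch_i.
    Returns (E_direct_dmx, E_ambient_dmx), each of shape (M, num_bands).
    """
    # direct parts are assumed fully coherent: amplitudes add before squaring
    E_direct_dmx = (G @ np.sqrt(E_direct)) ** 2
    # ambient parts are assumed mutually incoherent: energies add
    E_ambient_dmx = (G ** 2) @ E_ambient
    return E_direct_dmx, E_ambient_dmx

# usage sketch: 5.1 (L, R, C, LFE, Ls, Rs) to stereo, one frequency band
G = np.array([[1.0, 0.0, 0.707, 0.0, 0.707, 0.0],
              [0.0, 1.0, 0.707, 0.0, 0.0, 0.707]])
Ed = np.ones((6, 1))   # dummy direct energies
Ea = np.ones((6, 1))   # dummy ambient energies
print(downmix_energies(Ed, Ea, G))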
Figure 5 shows a further embodiment of a direct/ambient extractor 520 that applies gain parameters gD, gA to a downmix signal 115. The direct/ambient extractor 520 of Figure 5 can correspond to the direct/ambient extractor 420 of Figure 4. First, estimated level information of a direct part 545-1 or an ambient part 545-2 can be received from a direct/ambient estimator as described above. The received level information 545-1, 545-2 can be combined/downmixed in a step 550 to obtain downmixed level information of the direct part 555-1 or the ambient part 555-2, respectively. Then, in a step 560, gain parameters gD 565-1 or gA 565-2 can be derived from the downmixed level information 555-1, 555-2 for the direct or the ambient part, respectively. Finally, the direct/ambient extractor 520 can apply the derived gain parameters 565-1, 565-2 to the downmix signal 115 (step 570), so that the direct signal part 125-1 or the ambient signal part 125-2 is obtained. Here, it should be noted that in the embodiments of Figures 1, 4 and 5 the downmix signal 115 may consist of a plurality of downmix channels (Ch1 ... ChM) present at the inputs of the direct/ambient extractors 120, 420, 520, respectively. In further embodiments, the direct/ambient extractor 520 is configured to determine a direct-to-total (DTT) or ambient-to-total (ATT) energy ratio from the downmixed level information 555-1, 555-2 of the direct part or the ambient part, and to use as the gain parameters 565-1, 565-2 extraction parameters based on the determined DTT or ATT energy ratio. In yet further embodiments, the direct/ambient extractor 520 is configured to multiply the downmix signal 115 with a first square root of the extraction parameter, sqrt(DTT), to obtain the direct signal part 125-1, and with a second square root of the extraction parameter, sqrt(ATT), to obtain the ambient signal part 125-2. Here, the downmix signal 115 can correspond to the mono downmix signal 215 shown in the embodiment of Figure 2 (the 'mono downmix case'). In the mono downmix case, the direct/ambient extraction can thus be performed by applying sqrt(DTT) and sqrt(ATT). The same approach is, however, also valid for multi-channel downmix signals, in particular by applying sqrt(DTTi) and sqrt(ATTi) for each channel Chi. According to further embodiments, in case the downmix signal 115 comprises a plurality of channels (the 'multi-channel downmix case'), the direct/ambient extractor 520 can be configured to apply a first plurality of extraction parameters, for example sqrt(DTTi), to the downmix signal 115 to obtain the direct signal part 125-1, and a second plurality of extraction parameters, for example sqrt(ATTi), to the downmix signal 115 to obtain the ambient signal part 125-2. Here, the first and the second plurality of extraction parameters can form a diagonal matrix. In general, the direct/ambient extractor 120, 420, 520 can also be configured to extract the direct signal part 125-1 or the ambient signal part 125-2 by applying a square M-by-M extraction matrix to the downmix signal 115, where the size (M) of the square M-by-M extraction matrix corresponds to the number (M) of downmix channels (Ch1 ... ChM). The application of the direct/ambient extraction can therefore be described as applying a square M-by-M extraction matrix, where M is the number of downmix channels (Ch1 ... ChM). This covers all possible ways of manipulating the input signal to obtain the direct/ambient output, including the relatively simple approach in which the sqrt(DTTi) and sqrt(ATTi) parameters represent the main-diagonal elements of a square M-by-M extraction matrix configured as a diagonal matrix, as well as an LMS cross-mixing approach using a complete matrix. The latter will be described below. Here, it should be noted that the above approach of applying an M-by-M extraction matrix covers any number of channels, including one. According to further embodiments, the extraction matrix need not necessarily be a square matrix of size M by M, since there may be a smaller number of output channels; in that case, the extraction matrix has a reduced number of rows. An example of this would be the extraction of a single direct signal instead of M. It is also not always necessary to consider all M downmix channels as inputs, which corresponds to having fewer than M columns in the extraction matrix. This could be relevant, in particular, for applications where it is not necessary to have all channels as inputs.
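The diagonal-matrix variant can be sketched as follows in Python (NumPy assumed; names are illustrative): each downmix channel is simply scaled per band by sqrt(DTTi) and sqrt(ATTi).

import numpy as np

def extract_direct_ambient(X_dmx, dtt, att):
    """Diagonal-matrix direct/ambient extraction in a filter bank domain.

    X_dmx: complex tiles of the downmix, shape (M, num_frames, num_bands).
    dtt, att: direct-to-total and ambient-to-total energy ratios per downmix
    channel and band, shape (M, num_bands).
    Returns (X_direct, X_ambient) with the same shape as X_dmx.
    """
    g_direct = np.sqrt(np.clip(dtt, 0.0, 1.0))[:, None, :]   # sqrt(DTT_i)
    g_ambient = np.sqrt(np.clip(att, 0.0, 1.0))[:, None, :]  # sqrt(ATT_i)
    return g_direct * X_dmx, g_ambient * X_dmx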
Figure 6 presents a block diagram of a further embodiment 600 of a direct/ambient extractor 620 based on the LMS (least mean squares) solution with cross-channel mixing. The direct/ambient extractor 620 of Figure 6 can correspond to the direct/ambient extractor 120 of Figure 1. In the embodiment of Figure 6, identical blocks having similar implementations and/or functions as in the embodiment of Figure 1 are therefore denoted by the same numerals. However, the downmix signal 615 of Figure 6, which may correspond to the downmix signal 115 of Figure 1, may comprise a plurality 617 of downmix channels Ch1 ... ChM, where the number of downmix channels (M) is smaller than the number (N) of channels Ch1 ... ChN of the multi-channel audio signal 101, that is, M < N. Specifically, the direct/ambient extractor 620 is configured to extract the direct signal part 125-1 or the ambient signal part 125-2 by a least-mean-squares (LMS) solution with cross-channel mixing, where the LMS solution does not require equal ambient levels. This LMS solution, which does not require equal ambient levels and which is also extendable to any number of channels, is provided below. The LMS solution given here is not mandatory, but it represents a more precise alternative to the approach above. The symbols used in the LMS solution for the cross-mixing weights for direct/ambient extraction are:

Chi: channel i
a_i: direct sound gain in channel i
D and D^: direct part of the sound and its estimate
A_i and A_i^: ambient part of channel i and its estimate
P_X = E[X·X*]: estimated energy of X
E[ ]: expectation operator
E_X: estimation error of X
w_i^D: LMS cross-mixing weight of channel i for the direct part
w_i,n^A: LMS cross-mixing weight of channel n for the ambient part of channel i

In this context, it should be noted that the derivation of the LMS solution can be based on a spectral representation of the respective channels of the multi-channel audio signal, which means that all quantities are functions of the frequency bands. The signal model is given by

Chi = a_i · D + A_i.

The derivation first treats a) the direct part and then b) the ambient part. Finally, the weighting solution is derived and the method for normalizing the weights is described.

A) DIRECT PART. The estimate of the direct part from the weights is

D^ = sum_i ( w_i^D · Chi ).

The estimation error reads

E_D = D - sum_i ( w_i^D · Chi ).

To obtain the LMS solution, E_D must be orthogonal to the input signals, i.e.,

E[E_D · Chn*] = 0 for all n.

In matrix form, the above relationship reads A·w = P, where the entries of A are the cross-energies E[Chi·Chn*] and the entries of P are E[D·Chn*] = a_n · P_D.

B) AMBIENT PART. We start from the same signal model and estimate the ambient part of channel i from the weights as

A_i^ = sum_n ( w_i,n^A · Chn ).

The estimation error is

E_Ai = A_i - sum_n ( w_i,n^A · Chn ),

and the orthogonality condition in matrix form again yields a weighting solution of the form A·w = P, now with P_n = E[A_i·Chn*].

The weights can be solved by inverting the matrix A, which is identical in the calculation of both the direct part and the ambient part. With the above signal model and mutually incoherent ambient parts, the entries of A are E[Chi·Chn*] = a_i·a_n·P_D for i ≠ n and a_i^2·P_D + P_Ai for i = n. In the case of stereo signals, the solution for the direct part is, for example,

w_1^D = a_1·P_D·P_A2 / div, w_2^D = a_2·P_D·P_A1 / div,

where the divisor is

div = a_2^2·P_D·P_A1 + a_1^2·P_D·P_A2 + P_A1·P_A2.
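The following minimal Python sketch solves for these weights under the stated signal model (ambient parts mutually incoherent and incoherent with D); NumPy is assumed, the function name is hypothetical, and the simple energy-preserving normalization at the end is an illustrative stand-in for the normalization procedure described next.

import numpy as np

def lms_crossmix_weights(a, P_D, P_A):
    """LMS cross-mixing weights for direct/ambient extraction.

    a: array (M,) of direct sound gains a_i in the model Ch_i = a_i*D + A_i.
    P_D: scalar direct sound energy.
    P_A: array (M,) of ambient energies P_Ai (ambients mutually incoherent).
    Returns (w_direct, W_ambient): weights for D^ = w_direct @ Ch and
    A_i^ = W_ambient[i] @ Ch.
    """
    a = np.asarray(a, dtype=float)
    P_A = np.asarray(P_A, dtype=float)
    M = len(a)
    # cross-energy matrix A[n, i] = E[Ch_i Ch_n*] under the signal model
    A = P_D * np.outer(a, a) + np.diag(P_A)
    # direct part: right-hand side P_n = E[D Ch_n*] = a_n * P_D
    w_direct = np.linalg.solve(A, a * P_D)
    # ambient part of channel i: P_n = E[A_i Ch_n*] = delta_{i,n} * P_Ai
    W_ambient = np.linalg.solve(A, np.diag(P_A)).T
    # illustrative normalization: rescale so the output energies equal P_D, P_Ai
    e_d = w_direct @ A @ w_direct
    w_direct *= np.sqrt(P_D / max(e_d, 1e-12))
    for i in range(M):
        e_a = W_ambient[i] @ A @ W_ambient[i]
        W_ambient[i] *= np.sqrt(P_A[i] / max(e_a, 1e-12))
    return w_direct, W_ambient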
NORMALIZATION OF THE WEIGHTS. The weights above are the LMS solution, but since the energy levels must be preserved, the weights are normalized. This also makes the division by the term div in the formulas above unnecessary. The normalization ensures that the energies of the output direct and ambient channels are P_D and P_Ai, where i is the channel index. This is straightforward, assuming that the inter-channel coherences, the mixing factors and the channel energies are known. For simplicity, we focus on the case of two channels and in particular on a weighting pair w_1,1 and w_1,2, which are the gains used to produce the first ambient channel from the first and the second audio channel. The steps are as follows. Step 1: calculate the output signal energy, where the coherent part adds in amplitude and the incoherent part adds in energy, for example

E_out = w_1,1^2·P_Ch1 + w_1,2^2·P_Ch2 + 2·|w_1,1|·|w_1,2|·sign(w_1,1·w_1,2)·ICC·sqrt(P_Ch1·P_Ch2).

Step 2: calculate the normalization gain factor, for example

g_norm = sqrt(P_A1 / E_out),

and apply the result to the cross-mixing weighting factors w_1,1 and w_1,2. In step 1, absolute values and a sign operator for the ICC term are included in order to also cover the case where the audio channels are negatively coherent. The incoherent weighting factors are normalized in the same way. In particular, with reference to the above, the direct/ambient extractor 620 can be configured to derive the LMS solution by assuming a stable multi-channel signal model, so that the LMS solution is not restricted to a stereo-channel downmix signal. Figure 7a presents a block diagram of an embodiment 700 of a direct/ambient estimator 710, which is based on a stereo ambience estimation formula. The direct/ambient estimator 710 of Figure 7a can correspond to the direct/ambient estimator 110 of Figure 1. In particular, the direct/ambient estimator 710 of Figure 7a is configured to apply the stereo ambience estimation formula using the spatial parametric information 105 for each channel Chi of the multi-channel audio signal 101, wherein the stereo ambience estimation formula can be represented as a functional dependency DTTi = f(CLDi, ICCi), explicitly showing a dependency on the channel level difference (CLDi) or σi parameter and on the inter-channel coherence parameter (ICCi) of the channel Chi. As shown in Figure 7a, the spatial parametric information 105 is fed to the direct/ambient estimator 710 and can comprise the inter-channel relation parameters ICCi and σi for each channel Chi. After applying this stereo ambience estimation formula in the direct/ambient estimator 710, the direct-to-total (DTTi) or ambient-to-total (ATTi) energy ratio, respectively, is obtained at its output 715. It should be noted that, in contrast to the LMS solution, the stereo ambience estimation formula used to estimate the respective DTT or ATT energy ratio is based on an equal-ambience condition. In particular, the direct/ambient ratio estimation can be performed such that the ratio (DTT) of the direct energy in a channel compared to the total energy of that channel is formulated, for example, as

DTT = Re{<Ch·R*>} / <Ch·Ch*>,

where Ch is the inspected channel and R is the linear combination of the rest of the channels, and <...> is the time average; expressed in the transmitted parameters, this corresponds to DTTi = ICCi / σi. This formula follows when it is assumed that the ambient level is equal in the channel and in the linear combination of the rest of the channels, and that the coherence of these ambient parts is zero.
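A small Python sketch of this per-channel estimation, using the reconstruction above (DTTi = ICCi/σi, clipped to [0, 1]) as an assumed form of the stereo ambience estimation formula; NumPy is assumed and the names are illustrative.

import numpy as np

def stereo_ambience_estimation(icc, sigma):
    # icc: inter-channel coherence ICC_i between channel Chi and the linear
    #      combination R of the remaining channels (per band).
    # sigma: level ratio sigma_i = sqrt(P_Chi / P_R) per band; the CLD in dB
    #        would correspondingly be 20*log10(sigma).
    icc = np.asarray(icc, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    dtt = np.clip(icc / np.maximum(sigma, 1e-12), 0.0, 1.0)   # DTT_i
    att = 1.0 - dtt                                           # ATT_i
    return dtt, att

# sigma = 1 reproduces the straight line DTT = ICC of Figure 7b
print(stereo_ambience_estimation(icc=[0.0, 0.5, 1.0], sigma=[1.0, 1.0, 1.0]))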
Figure 7b shows a graph 750 of an exemplary DTT (direct-to-total) energy ratio 760 as a function of the inter-channel coherence parameter ICC 770. In the embodiment of Figure 7b, the channel level difference (CLD) or σ parameter is exemplarily set to 1 (σ = 1), so that the level P(Chi) of the channel Chi and the level P(R) of the linear combination R of the rest of the channels are equal. In this case, the DTT energy ratio 760 is linearly proportional to the ICC parameter, as indicated by a straight line 775 marked DTT ~ ICC. It can be seen in Figure 7b that in the case ICC = 0, which corresponds to a completely incoherent inter-channel relationship, the DTT energy ratio 760 will be 0, which corresponds to a completely ambient situation (case 'R1'). In the case ICC = 1, however, which corresponds to a completely coherent inter-channel relationship, the DTT energy ratio 760 will be 1, which corresponds to a completely direct situation (case 'R2'). Therefore, in case R1 there is essentially no direct energy, while in case R2 there is essentially no ambient energy in a channel relative to the total energy of that channel. Figure 8 shows a block diagram of an encoder/decoder system 800, according to further embodiments of the present invention. On the decoder side of the encoder/decoder system 800, an embodiment of a decoder 820 is shown, which may correspond to the apparatus 100 of Figure 1. Due to the similarity of the embodiments of Figure 1 and Figure 8, identical blocks having similar implementations and/or functions in these embodiments are denoted by the same numerals. As shown in the embodiment of Figure 8, the direct/ambient extractor 120 can operate on a downmix signal 115 having the plurality Ch1 ... ChM of downmix channels. The direct/ambient estimator 110 of Figure 8 can, in addition, be configured to receive at least two downmix channels 825 of the downmix signal 115 (optional), so that the level information 113 of the direct part or the ambient part of the multi-channel audio signal 101 is estimated based on the spatial parametric side information 105 and on the at least two received downmix channels 825. Finally, the direct signal part 125-1 or the ambient signal part 125-2 is obtained after the extraction by the direct/ambient extractor 120. On the encoder side of the encoder/decoder system 800, an embodiment of an encoder 810 is shown, which may comprise a downmixer 815 for downmixing the multi-channel audio signal (Ch1 ... ChN) into the downmix signal 115 having the plurality Ch1 ... ChM of downmix channels, whereby the number of channels is reduced from N to M. The downmixer 815 can also be configured to produce the spatial parametric information 105 by calculating the inter-channel relationships of the multi-channel audio signal 101. In the encoder/decoder system 800 of Figure 8, the downmix signal 115 and the spatial parametric information 105 can be transmitted from the encoder 810 to the decoder 820. Here, the encoder 810 can derive an encoded signal based on the downmix signal 115 and the spatial parametric information 105 for transmission from the encoder side to the decoder side. In addition, the spatial parametric information 105 is based on the channel information of the multi-channel audio signal 101. On the one hand, the inter-channel relation parameters σi(Chi, R) and ICCi(Chi, R) can be calculated between the channel Chi and the linear combination R of the rest of the channels in the encoder 810 and transmitted within the encoded signal. The decoder 820 can, in turn, receive the encoded signal and operate on the transmitted inter-channel relation parameters σi(Chi, R) and ICCi(Chi, R). On the other hand, the encoder 810 can also be configured to calculate inter-channel coherence parameters ICCi,j between pairs of different channels (Chi, Chj) to be transmitted. In this case, the decoder 820 must be able to derive the parameters ICCi(Chi, R) between the channel Chi and the linear combination R of the rest of the channels from the pairwise calculated and transmitted parameters ICCi,j(Chi, Chj), so that the corresponding embodiments described previously can be carried out.
It should be noted in this context that the decoder 820 cannot reconstruct the ICCi(Chi, R) parameters from knowledge of the downmix signal 115 alone. In embodiments, the transmitted spatial parameters do not only describe pairwise channel comparisons. For example, in the most typical MPS case there are two downmix channels. The first set of spatial parameters in MPS decoding turns the two channels into three: center, left and right. The parameter set guiding this mapping is called the center prediction coefficient (CPC), together with an ICC parameter that is specific to this two-to-three configuration. The second set of spatial parameters divides each of these into two: the side channels into corresponding front and rear channels, and the center channel into the center and LFE channels. This mapping uses the ICC and CLD parameters introduced before. It is not practical to devise calculation rules for every type of downmixing configuration and every type of spatial parameter. However, it is practical to follow the downmixing steps virtually. Since we know how the two channels become three, and the three become six, we will, in the end, arrive at an input-output relationship describing how the two audio channels are routed to the six outputs. The outputs are simply linear combinations of the downmix channels, plus linear combinations of decorrelated versions of them. It is not necessary to actually decode the output signal and measure it; since we know this "decoding matrix", we can compute the ICC and CLD parameters between any channels or combinations of channels in the parametric domain in a computationally efficient way. Regardless of the downmix and multi-channel signal configuration, each output channel of the decoded signal is a linear combination of the downmix signals plus a linear combination of a decorrelated version of each:

Ch_i = sum_m ( a_i,m · Chdmx_m ) + sum_m ( b_i,m · D[Chdmx_m] ),

where the operator D[ ] corresponds to a decorrelator, that is, a process that produces an incoherent duplicate of its input signal. The factors a and b are known, since they are directly derivable from the parametric side information. This is because, by definition, the parametric information is guidance for the decoder on how to create the multi-channel output from the downmix signals. The above formula can be simplified to

Ch_i = sum_m ( a_i,m · Chdmx_m ) + D_i,

since all the uncorrelated parts can be combined for the energy/coherence comparison. The energy of D_i is known, since the factors b in the first formula are also known. From this point on, it should be noted that we can perform any kind of coherence and energy comparison between the output channels or between different linear combinations of the output channels. In a simple example with two downmix channels and a set of output channels, of which, for example, channels number 3 and 5 are compared to each other, the sigma is calculated as follows:

σ_3,5 = sqrt( E[Ch3·Ch3*] / E[Ch5·Ch5*] ),

where E[ ] is the expectation operator (in practice, a time average). Both terms can be formulated as follows:

E[Ch3·Ch3*] = a_3,1^2·E[Ldmx·Ldmx*] + a_3,2^2·E[Rdmx·Rdmx*] + 2·a_3,1·a_3,2·Re{E[Ldmx·Rdmx*]} + E[D_3·D_3*],

E[Ch5·Ch5*] = a_5,1^2·E[Ldmx·Ldmx*] + a_5,2^2·E[Rdmx·Rdmx*] + 2·a_5,1·a_5,2·Re{E[Ldmx·Rdmx*]} + E[D_5·D_5*].

All of the above parameters are known or measurable from the downmix signals. The cross terms E[Chdmx·D*] are, by definition, zero and therefore do not appear in the formulas. Similarly, the coherence formula is

ICC_3,5 = Re{E[Ch3·Ch5*]} / sqrt( E[Ch3·Ch3*] · E[Ch5·Ch5*] ).

Again, since all parts of the above formula are linear combinations of the inputs plus the decorrelated signals, the solution is directly available. The above example compared two output channels, but a comparison between linear combinations of output channels can be made in the same way, as in an exemplary process that will be described later.
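The following Python sketch (NumPy assumed; function and parameter names are hypothetical) illustrates this parametric-domain comparison: given the linear-combination weights, the combined decorrelator energies, and the measured energies and cross spectrum of the two downmix channels, it returns σ and ICC between two virtual output channels without decoding them.

import numpy as np

def parametric_channel_relations(a_i, a_j, Ed_i, Ed_j, Ed_ij,
                                 E_L, E_R, cross_LR):
    """sigma and ICC between two virtual output channels i and j (per band).

    a_i, a_j: length-2 weights mapping (Ldmx, Rdmx) to output channels i, j.
    Ed_i, Ed_j: energies of the combined decorrelated parts D_i, D_j.
    Ed_ij: cross-energy E[D_i D_j*] of shared decorrelated parts (0 if none).
    E_L, E_R, cross_LR: downmix energies and Re{E[Ldmx Rdmx*]} (scalars).
    """
    def energy(a, Ed):
        return (a[0] ** 2 * E_L + a[1] ** 2 * E_R
                + 2 * a[0] * a[1] * cross_LR + Ed)

    E_i = energy(a_i, Ed_i)
    E_j = energy(a_j, Ed_j)
    cross_ij = (a_i[0] * a_j[0] * E_L + a_i[1] * a_j[1] * E_R
                + (a_i[0] * a_j[1] + a_i[1] * a_j[0]) * cross_LR + Ed_ij)
    sigma = np.sqrt(E_i / max(E_j, 1e-12))
    icc = cross_ij / max(np.sqrt(E_i * E_j), 1e-12)
    return sigma, icc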
In summary of the previous embodiments, the presented technique/concept can comprise the following steps: 1. Recover the inter-channel relations (coherence, level) of an "original" set of channels, which may be greater in number than the downmix channel(s). 2. Estimate the ambient and direct energies in this "original" set of channels. 3. Downmix the direct and ambient energies of this "original" set of channels to the smaller number of channels. 4. Use the downmixed energies to extract the direct and ambient signals in the provided downmix channels by applying gain factors or a gain matrix. The use of the spatial parametric side information is best explained and summarized with the embodiment of Figure 2. In the embodiment of Figure 2, we have a parametric stereo stream, which includes a single audio channel and spatial side information about the inter-channel differences (coherence, level) of the stereo sound it represents. Now, since we know the inter-channel differences, we can apply the stereo ambience estimation formula above to them and obtain the direct and ambient energies of the original stereo channels. Then, we can "downmix" the channel energies by adding the direct energies together (with a coherent sum) and the ambient energies together (with an incoherent sum) and derive the direct-to-total and ambient-to-total energy ratios of the single downmix channel. With reference to the embodiment of Figure 2, the spatial parametric information essentially comprises inter-channel coherence parameters (ICCL, ICCR) and channel level difference parameters (CLDL, CLDR) corresponding to the left (L) and the right (R) channel of the parametric stereo audio signal, respectively. Here, it should be noted that the inter-channel coherence parameters ICCL and ICCR are the same (ICCL = ICCR), while the channel level difference parameters CLDL and CLDR are related by CLDL = -CLDR. Correspondingly, since the channel level difference parameters CLDL and CLDR are typically the decibel values of the parameters σL and σR, respectively, the parameters σL and σR for the left (L) and the right (R) channel are related by σL = 1/σR. These inter-channel difference parameters can readily be used to calculate the respective direct-to-total (DTTL, DTTR) and ambient-to-total (ATTL, ATTR) energy ratios for both channels (L, R) based on the stereo ambience estimation formula. In the stereo ambience estimation formula, the direct-to-total and ambient-to-total energy ratios (DTTL, ATTL) of the left channel (L) depend on the inter-channel difference parameters (CLDL, ICCL) for the left channel L, while the direct-to-total and ambient-to-total energy ratios (DTTR, ATTR) of the right channel (R) depend on the inter-channel difference parameters (CLDR, ICCR) for the right channel R. Furthermore, the energies (EL, ER) for both channels L, R of the parametric stereo audio signal can be derived based on the channel level difference parameters (CLDL, CLDR) for the left (L) and right (R) channel, respectively. Here, the energy (EL) for the left channel L can be obtained by applying the channel level difference parameter (CLDL) for the left channel L to the mono downmix signal, while the energy (ER) for the right channel R can be obtained by applying the channel level difference parameter (CLDR) for the right channel R to the mono downmix signal.
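The complete parametric-stereo chain summarized in steps 1 to 4 above (and detailed in the following paragraphs) can be sketched in Python as follows. NumPy is assumed; the function name and the simplifying assumptions used here (EL + ER equals the downmix energy, DTT_i = ICC_i/σ_i) are illustrative rather than taken from the description.

import numpy as np

def mono_downmix_dtt_att(cld_L_db, icc, E_mono):
    # cld_L_db: channel level difference of the left channel in dB (CLD_R = -CLD_L)
    # icc: inter-channel coherence (ICC_L = ICC_R)
    # E_mono: energy of the mono downmix channel (per band)
    sigma_L = 10.0 ** (cld_L_db / 20.0)           # amplitude ratio for L
    sigma_R = 1.0 / sigma_L                       # since CLD_R = -CLD_L
    # steps 1-2: per-channel ratios via the stereo ambience estimation formula
    dtt_L = np.clip(icc / sigma_L, 0.0, 1.0)
    dtt_R = np.clip(icc / sigma_R, 0.0, 1.0)
    # channel energies obtained by applying the CLDs to the downmix energy
    E_L = E_mono * sigma_L ** 2 / (1.0 + sigma_L ** 2)
    E_R = E_mono - E_L
    # step 3: downmix energies (direct parts: coherent sum, ambient: incoherent sum)
    E_D_mono = (np.sqrt(dtt_L * E_L) + np.sqrt(dtt_R * E_R)) ** 2
    E_A_mono = (1.0 - dtt_L) * E_L + (1.0 - dtt_R) * E_R
    # step 4: DTT_mono and ATT_mono, normalized so that they sum to one
    total = E_D_mono + E_A_mono
    return E_D_mono / total, E_A_mono / total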
Then, when multiplying the energies (EL, ER) for both channels (E, D) with parameters based on DTTL, DTTR and ATTL, ATTR corresponding, the direct (EDL, EDR) and ambient (EAL, EAR) energies for both channels (E, D) will be obtained. Then, direct energies (EDL, EDR) for both channels (E, D) can be combined / added by using a coherent downmixing standard to obtain downmixed energy (ED, mono) for the direct part of the mono downmix signal, while the ambient energies (EAL, EAR) for both channels (E, D) can be combined / added by using an incoherent dowmixing standard to obtain downmixed energy (EA / mono) for the ambient part of the mono downmix signal. Then, when relating the downmixed energies (ED, mono, EA, mono) for the direct signal part and the ambient signal part to the total energy (Emono) of the mono downmix signal, the ratio of direct to total energy (DTTmono) and environment for total (ATTmono) of the mono downmix signal will be obtained. Finally, based on these proportions of DTTmono and ATTmono energy, the direct signal part or the ambient signal part can essentially be extracted from the mono downmix signal. In audio playback, there is usually a need to reproduce sound on headphones. Headphone listening has a specific aspect that makes it drastically different from loudspeaker listening and also to any natural sound environment. The audio is adjusted directly to the left and right ears. The audio content produced is typically produced for loudspeaker playback. Therefore, audio signals do not contain the properties and indications that our auditory system uses in spatial sound perception. This is the case, unless binaural processing is introduced into the system. Binaural processing, fundamentally, can be said to be a process that occurs in the incoming sound and modifies it so that it contains only those interauricular and monaural properties that are perceptually correct (in relation to the way our hearing system processes spatial sound) ). Binaural processing is not a simple task and the existing solutions, according to the prior art, have many sub-idealities. There are a large number of orders in which binaural processing for music and film reproduction is already included, such as multimedia players. and processing devices that are designed to transform multi-channel audio signals into the binaural counterpart for headphones. The typical approach is to use head-related transfer functions (HRTFs) to make virtual speakers and add an ambient effect to the signal. This, in theory, could be equivalent to listening with speakers in a specific environment. Practice, however, has repeatedly shown that this approach has not consistently satisfied listeners. There seems to be a commitment that good spatialization with this simple method comes at the cost of loss of audio quality, such as having non-preferred changes in the color or timbre of the sound, annoying perception of the ambient effect and loss of dynamics. Additional problems include inaccurate location (eg, head location, frontal-rear confusion), lack of spatial distance from sound sources, and lack of interauricular correspondence, that is, hearing sensation close to the ears due to the wrong interauricular indications. Different listeners can judge problems very differently. Sensitivity also varies depending on the input material, such as music (strict quality criteria in terms of color of sound), films (less strict) and games (even less strict, but location is important). 
There are also typically different design goals depending on the content. Therefore, the following description deals with an approach to overcome the above problems with as much success as possible to maximize the overall average perceived quality. Figure 9a shows a block diagram of an overview 900 of a binaural direct sound interpretation device 910, in accordance with the additional embodiments of the present invention. As shown in Figure 9a, the binaural direct sound interpretation device 910 is configured to process the direct signal part 125-1, which may be present at the output of the direct extractor / environment 120 in the realization of Figure 1, to obtain a first signal binaural output signal 915. The first binaural output signal 915 may comprise a left channel indicated by E and a right channel indicated by D. Here, the 910 binaural somerset device can be configured to feed the direct signal part 125-1 via head-related transfer functions (HRTFs) to obtain a transformed direct signal part. The direct-sound device 910 can furthermore be configured to apply ambient effect to the transformed direct signal portion to finally obtain the first binaural output signal 915. Figure 9b shows a block diagram of details 905 of the binaural direct sound interpretation device 910 of Figure 9a. The binaural 910 direct sound interpretation device may comprise an "HRTF transformer" indicated by block 912 and an ambient effect processing device (reverb or parallel simulation of the previous reflections) indicated by block 914. As shown in Figure 9b, the HRTF transformer 912 and the environmental effect processing device 914 can be operated on the direct signal part 125-1 by applying the head-related transfer functions 10 (HRTFs) and the ambient effect in parallel, so that the first output signal binaural 915 will be obtained. Specifically, with reference to Figure 9b, this ambient effect processing can also provide an incoherent direct reverberated signal 919, which can be processed by a subsequent 920 cross-mix filter to adapt the signal to the interauricular coherence of diffuse sound fields. Here, the combined output of the filter 920 and the HRTF transformer 912 constitute the first binaural output signal 915. According to the additional realizations, the ambient effect processing in direct sound 20 can also be a parametric representation of previous reflections. In the realizations, therefore, the environmental effect can preferably be applied in parallel to the HRTFs, and not in series (that is, when applying the environmental effect after feeding signal 25 through the HRTFs). Specifically, only the sound that propagates directly from the source goes through or is transformed by the corresponding HRTFs. The indirect / reverberated sound can be approached to enter the ears all around, that is, in a statistical way (by using coherence control instead of HRTFs). There may also be serial implementations, but the parallel method is preferred. Figure 10a shows a block diagram of an overview 1000 of a binaural ambient sound interpretation device 1010, according to the additional embodiments of the present invention. As shown in Figure 10a, the binaural ambient sound interpretation device 1010 can be configured to process the ambient signal portion of output 125-2, for example, from the direct / ambient extractor 120 of Figure 1, to obtain the second binaural output 1015. The second binaural output signal 1015 can also comprise a left channel (L) and a right channel (R). 
Figure 10b shows a detail block diagram 1005 of the binaural ambient sound interpretation device 1010 of Figure 10a. It can be seen in Figure 10b that the binaural ambient sound interpretation device 1010 can be configured to apply ambient effect, as indicated by block 1012 denoted by "ambient effect processing", to the ambient signal part 125-2, so that an incoherent reverberated ambient signal 1013 will be obtained. The binaural ambient sound interpretation device 1010 can furthermore be configured to process the incoherent reverberated ambient signal 1013 by applying a filter, such as a cross-mixing filter indicated by block 1014, so that the second binaural output signal 1015 will be provided, the second binaural signal 1015 being adapted to the interauricular coherence of real diffuse sound fields. Block 1012 denoted by "ambient effect processing" can also be configured so that it directly produces the interauricular coherence of real diffuse sound fields. In this case, block 1014 is not used. According to a further embodiment, the binaural ambient sound interpretation device 1010 is configured to apply ambient effect and / or a filter to the ambient signal part 125-2 to provide the second binaural output signal 1015, so that the second binaural output signal 1015 will be adapted to the interauricular coherence of real diffuse sound fields. In the above achievements, de-correlation and coherence control can be performed in two consecutive steps, but this is not a requirement. It is also possible to achieve the same result with a single step process, without an intermediate formulation of inconsistent signals. Both methods are equally valid. Figure 11 shows a conceptual block diagram of an embodiment 1100 of binaural reproduction of a multi-channel input audio signal 101. Specifically, the embodiment of Figure 11 represents an apparatus for binaural reproduction of the multi-input audio signal. channels 101, comprising a first converter 1110 ("frequency transformation"), the separator 1120 ("direct-ambient separation"), the binaural direct sound interpretation device 910 ("direct source interpretation"), the interpretation device binaural surround sound 1010 ("ambient sound interpretation"), combiner 1130, as indicated by 'mais'and a second converter 1140 ("inverse frequency transformation"). In particular, the first converter 1110 can be configured to convert the multi-channel input audio signal 101 into a spectral representation 1115. The separator 1120 can be configured to extract the direct signal part 125-1 or the ambient signal part 125-2 of the spectral representation 1115. Here, the separator 1120 can correspond to the apparatus 100 of Figure 1, especially including the direct estimator / environment 110 and the direct extractor / environment 120 of the realization of Figure 1. As explained before, the binaural direct sound interpretation 910 can be operated on the direct signal part 125-1 to obtain the first binaural output signal 915. Correspondingly, the binaural ambient sound interpretation device 1010 can be operated on the ambient signal part 125- 2 to obtain the second binaural output signal 1015. Combiner 1130 can be configured to combine the first binaural output signal 915 and the second signal binaural output 1015 to obtain a combined signal 1135. Finally, the second converter 1140 can be configured to convert the combined signal 1135 into a time domain to obtain a stereo output audio signal 1150 ("stereo output to headphones" ). 
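As a rough Python sketch (NumPy assumed) of the processing chain of Figure 11, with all impulse responses, delays and mixing gains as arbitrary placeholders rather than the processing described above: the direct part is rendered through HRTF-like filters with an ambient effect applied in parallel, the ambient part is reverberated and coarsely coherence-controlled, and the two binaural signals are combined.

import numpy as np

def render_binaural(direct, ambient, hrtf_l, hrtf_r, reverb_ir, mix=0.3):
    # direct, ambient: the separated mono time-domain signal parts
    # hrtf_l, hrtf_r: impulse responses standing in for the HRTFs
    # reverb_ir: impulse response standing in for the ambient-effect processing
    d_l = np.convolve(direct, hrtf_l)            # direct sound through HRTFs
    d_r = np.convolve(direct, hrtf_r)
    d_rev = np.convolve(direct, reverb_ir)       # parallel ambient effect on the direct part
    a_rev = np.convolve(ambient, reverb_ir)      # ambient effect on the ambient part
    a_l = a_rev + mix * np.roll(a_rev, 512)      # crude cross-mixing for interaural coherence
    a_r = a_rev - mix * np.roll(a_rev, 512)
    n = min(len(d_l), len(d_rev), len(a_l))
    first_l = d_l[:n] + mix * d_rev[:n]          # first binaural output signal (direct path)
    first_r = d_r[:n] - mix * d_rev[:n]
    return first_l + a_l[:n], first_r + a_r[:n]  # combined binaural output signal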
The frequency transformation operation shown in Figure 11 illustrates that the system works in a frequency transformation domain, which is a natural domain in the perceptual processing of spatial audio. The system itself does not necessarily have a frequency transformation if it is used as an addition to a system that already works in the frequency transformation domain. The above direct / ambient separation process can be subdivided into two different parts. In the direct estimation / environment part, the levels and / or proportions of the direct / environment part are estimated based on the combination of a signal model and the properties of the audio signal. In the direct extraction / environment part, the known proportions and the input signal can be used in the creation of the direct output signals in the environment. Finally, Figure 12 presents a general block diagram of a 1200 realization of the estimate / direct extraction / environment including the use case of binaural reproduction. In particular, embodiment 1200 of Figure 12 may correspond to embodiment 1100 of Figure 11. However, in embodiment 1200, the details of separator 1120 of Figure 11 corresponding to blocks 110, 120 of the embodiment of Figure 1 are presented, which includes the estimation / extraction process based on spatial parametric information 105. In addition, opposite to realization 1100 of Figure 11, there is no conversion process between different domains in realization 1200 of Figure 12. The blocks of realization 1200 are also explicitly operated in downmix signal 115, which can be derived from the multi-channel audio signal 101. Figure 13a shows a block diagram of an embodiment of an apparatus 1300 for extracting a direct / ambient signal from a mono downmix signal in a filter bank domain. As shown in Figure 13a, apparatus 1300 comprises an analysis filter bank 1310, a synthetic filter bank 1320 for the direct part and a synthetic filter bank 1322 for the ambient part. In particular, the analysis filter bank 1310 of the apparatus 1300 can be implemented to perform a short time Fourier transform (STFT) or it can, for example, be configured as an analysis QMF filter bank, while the filter banks of synthesis 1320, 1322 of the apparatus 1300 can be implemented to perform a short time inverse Fourier transform (ISTFT) or can, for example, be configured as QMF filter banks if synthesis. The analysis filter bank 1310 is configured to receive a mono downmix signal 1315, which can correspond to the mono downmix signal 215 as shown in the Figure 2 embodiment, and to convert the mono downmix signal 1315 into a 1311 plurality of bank sub-bands. filter. As can be seen in Figure 13a, the plurality 1311 of filter bank sub-bands is connected to a plurality 1350, 1352 of direct extraction / environment blocks, respectively, in which the plurality 1350, 1352 of direct extraction / environment blocks is configured. to apply parameters based on DTTmono or ATTmono 1333, 1335 to the filter bank sub-ranges, respectively. The parameters based on DTTmono ATTmono 1333, 1335 can be provided from a DTTmono calculator, ATTmono 1330, as shown in Figure 13b. 
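A compact Python sketch of the Figure 13a signal path, using SciPy's STFT/ISTFT as a stand-in for the analysis and synthesis filter banks and assuming, for brevity, time-invariant per-bin ratios (the mapping from the coarser parameter bands to the filter bank bins is omitted).

import numpy as np
from scipy.signal import stft, istft

def extract_mono_direct_ambient(x_mono, dtt_per_band, fs=48000, nfft=1024):
    """Split a mono downmix into direct and ambient parts in an STFT domain.

    x_mono: time-domain mono downmix signal.
    dtt_per_band: DTT_mono value per STFT bin (shape: nfft//2 + 1);
    the ambient-to-total ratio is taken as ATT = 1 - DTT.
    """
    f, t, X = stft(x_mono, fs=fs, nperseg=nfft)              # analysis filter bank
    g_direct = np.sqrt(np.clip(dtt_per_band, 0.0, 1.0))[:, None]
    g_ambient = np.sqrt(1.0 - np.clip(dtt_per_band, 0.0, 1.0))[:, None]
    _, direct = istft(g_direct * X, fs=fs, nperseg=nfft)     # synthesis, direct part
    _, ambient = istft(g_ambient * X, fs=fs, nperseg=nfft)   # synthesis, ambient part
    return direct, ambient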
In particular, the DTTmonor ATTraono 1330 calculator in Figure 13b can be configured to calculate the proportions of DTTmono, ATTmono energy or derive the parameters based on DTTmono, ATTmono from the inter-channel coherence parameters and provided channel level difference (ICCL, CLDL, ICCR, CLDR) 105 corresponding to the left and right channel (E, D) of a parametric stereo audio signal (for example, the parametric stereo audio signal 201 of Figure 2), which have been described correspondingly before. Here, for a single bank filter sub-range, the corresponding parameters 105 and parameters based on DTTmonor ATTmono 1333, 1335 can be used. In this context, it is pointed out that these parameters are not constant over the frequency. As a result of applying the parameters based on DTTmono or ATTmono 1333, 1335, a plurality 1353, 1355 of modified filter bank sub-ranges will be obtained, respectively. Subsequently, the plurality 1353, 1355 of modified filter bank sub-bands is fed into the synthetic filter banks 1320, 1322, respectively, which are configured to synthesize the plurality 1353, 1355 of modified filter bank sub-bands in order to obtain the direct signal part 1325-1 or ambient signal part 1325-2 of mono downmix signal 1315, respectively. Here, the direct signal part 1325-1 of Figure 13a can correspond to the direct signal part 125-1 of Figure 2, while the ambient signal part 1325-2 of Figure 13a can correspond to the ambient signal part 125-2 of Figure 2. With reference to Figure 13b, a direct extraction / environment block 1380 of the plurality 1350, 1352 of direct extraction / environment blocks of Figure 13a especially comprises the DTTmono calculator, ATTmono 1330 and a 1360 multiplier. Multiplier 1360 can be configured to multiply a single filter bank sub-band (FB) 1301 of the plurality of filter bank sub-bands 1311 with the parameter based on DTTmono / corresponding ATT mono 1333, 1335, so that a single modified filter bank sub-band 1365 of the plurality of sub-bands of filter bank 1353, 1355 will be obtained. In particular, the direct extraction / environment block 1380 is configured to apply the parameter based on DTTmono, in this case block 1380 belongs to the plurality 1350 of blocks, while it is configured to apply the parameter based on ATTmono, in this case the block 1380 belongs to the 1352 plurality of blocks. The only modified filter bank sub-range 1365 can, in addition, be supplied to the respective synthetic filter bank 1320, 1322 for the direct part or the ambient part. According to the achievements, the spatial parameters and the derived parameters are given in a frequency resolution, according to the critical ranges of the human auditory system, for example, 28 ranges, which is usually less than the resolution of the filter bank. Therefore, direct extraction / environment, according to the realization of Figure 13a, operates essentially in different sub-bands in a filter bank domain based on the inter-channel coherence parameters and difference in channel level calculated by sub-band, which may correspond to the inter-channel relationship parameters 335 of Figure 3b. Figure 14 shows a schematic illustration of an exemplary MPEG Surround 1400 decoding scheme, in accordance with the further embodiment of the present invention. In particular, the realization of Figure 14 describes a decoding of a stereo downmix 1410 to six output channels 1420. 
Here, the signals denoted by "res" are residual signals, which are optional substitutes for the decorrelated signals (from the blocks denoted by "D"). According to the embodiment of Figure 14, the spatial parametric information or inter-channel relation parameters (ICC, CLD) transmitted within an MPS stream from an encoder, such as encoder 810 of Figure 8, to a decoder, such as decoder 820 of Figure 8, can be used to generate the decoding matrices 1430, 1440 denoted by "pre-decorrelator matrix M1" and "mixing matrix M2", respectively. It is specific to the embodiment of Figure 14 that the generation of the output channels 1420 (that is, the upmix channels L, Ls, R, Rs, C, LFE) from the side channels (L, R) and the center channel (C) (L, R, C 1435) using the mixing matrix M2 1440 is essentially determined by the spatial parametric information 1405, which can correspond to the spatial parametric information 105 of Figure 1, comprising in particular the inter-channel relation parameters (ICC, CLD), according to the MPEG Surround standard. Here, the division of the left channel (L) into the corresponding output channels L, Ls, of the right channel (R) into the corresponding output channels R, Rs, and of the center channel (C) into the corresponding output channels C, LFE, respectively, can be represented by one-to-two (OTT) configurations, each having a respective input for the corresponding ICC, CLD parameters. The exemplary MPEG Surround decoding scheme 1400, which specifically corresponds to a "5-2-5 configuration", can, for example, comprise the following steps. In a first step, the spatial parameters or parametric side information can be formulated into the decoding matrices 1430, 1440 shown in Figure 14, according to the existing MPEG Surround standard. In a second step, the decoding matrices 1430, 1440 can be used in the parameter domain to provide inter-channel information of the upmix channels 1420. In a third step, with the inter-channel information thus provided, the direct/ambient energies of each upmix channel can be calculated. In a fourth step, the obtained direct/ambient energies can be downmixed to the number of downmix channels 1410. In a fifth step, the weights to be applied to the downmix channels 1410 can be calculated. Before going any further, it should be noted that the exemplary process just mentioned requires measuring the average powers of the downmix channels, as well as the cross spectrum of the downmix channels. Here, the average powers of the downmix channels are deliberately referred to as energies, since the term "average power" is not commonly used in this context. The expectation operator, indicated by square brackets, can be replaced in practical applications by a time average, recursive or non-recursive. The energies and the cross spectrum can be measured from the downmix signal in a simple way. It should also be noted that the energy of a linear combination of two channels can be formulated from the channel energies, the mixing factors and the cross spectrum (all in the parameter domain, where no signal operations are required). The linear combination Ch = a·Ldmx + b·Rdmx has the following energy: E[Ch^2] = a^2·E[Ldmx^2] + b^2·E[Rdmx^2] + 2·a·b·E[Ldmx·Rdmx], where E[Ldmx·Rdmx] is the cross spectrum (its real part in the case of complex-valued sub-band signals). Next, the individual steps of the exemplary process (that is, the decoding scheme) are described. FIRST STEP (SPATIAL PARAMETERS TO MIXING MATRICES): As previously described, the matrices M1 and M2 are created according to the MPEG Surround standard. The a-th row, b-th column element of M1 is denoted M1(a, b).
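As an illustration of this parameter-domain bookkeeping, the sketch below measures downmix energies and the cross spectrum and evaluates the energy of such a linear combination. It is a minimal sketch under stated assumptions: the recursive-averaging constant alpha and the frame layout are illustrative choices, not part of the MPEG Surround specification.

```python
import numpy as np

def combination_energy(a, b, e_l, e_r, cross_lr):
    """Energy of Ch = a*Ldmx + b*Rdmx from per-band downmix statistics.

    e_l, e_r -- time-averaged energies E[Ldmx^2], E[Rdmx^2]
    cross_lr -- time-averaged cross spectrum E[Ldmx * conj(Rdmx)]
    All quantities live in the parameter domain; no signal operations are needed.
    """
    return a * a * e_l + b * b * e_r + 2.0 * a * b * np.real(cross_lr)

def running_downmix_stats(L, R, alpha=0.1):
    """Recursive (one-pole) averages of energies and cross spectrum per STFT frame.

    L, R -- complex STFT arrays of shape (bins, frames)
    Yields (E_L, E_R, C_LR) after each frame.
    """
    e_l = e_r = c_lr = 0.0
    for l_frame, r_frame in zip(L.T, R.T):
        e_l = (1.0 - alpha) * e_l + alpha * np.abs(l_frame) ** 2
        e_r = (1.0 - alpha) * e_r + alpha * np.abs(r_frame) ** 2
        c_lr = (1.0 - alpha) * c_lr + alpha * l_frame * np.conj(r_frame)
        yield e_l, e_r, c_lr
```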
SECOND STEP (MIXING MATRICES AND DOWNMIX ENERGIES/CROSS SPECTRUM TO INTER-CHANNEL INFORMATION OF THE UPMIX CHANNELS): Now we have the mixing matrices M1 and M2. We need to formulate how the output channels are created from the left downmix channel (Ldmx) and the right downmix channel (Rdmx). We assume that decorrelators are used (Figure 14, grey area). Decoding/upmixing in the MPEG Surround standard ultimately provides, for the general input-output relation, a formula of the form L = a·Ldmx + b·Rdmx + c·D1(v1) + d·D2(v2) + e·D3(v3). The above is exemplary for the upmixed front left channel; the other channels can be formulated in the same way. The D elements are the decorrelators, and a–e are weights that can be calculated from the entries of the matrices M1 and M2. In particular, the factors a and b can be formulated directly from the matrix entries, and likewise for the other channels. The decorrelator input signals are vn = M1(n+3, 1)·Ldmx + M1(n+3, 2)·Rdmx; these signals are the inputs to the decorrelators from the matrix on the left in Figure 14. Their energies can be calculated as explained above, and a decorrelator does not affect the energy. A perceptually motivated way to extract the ambience of multiple channels is to compare one channel against the sum of all other channels (note that this is one option of many). Now, if we consider the L channel as an example, the rest of the channels is the linear combination X of all remaining upmix channels. We use the symbol "X" here, because using "R" for the "rest of the channels" could be confusing. The energy of the L channel, the energy of the channel X and the cross spectrum between L and X can then be calculated in the parameter domain as explained above, and from these we can formulate the inter-channel coherence ICC(L, X) = E[L·X] / sqrt(E[L^2]·E[X^2]). THIRD STEP (INTER-CHANNEL INFORMATION OF THE UPMIX CHANNELS TO DTT PARAMETERS OF THE UPMIX CHANNELS): Now we can calculate the DTT of the L channel from the ICC and the channel level difference between L and X, according to the stereo ambience estimation formula described before. L's direct energy is then E[Ldir^2] = DTT·E[L^2] and L's ambient energy is E[Lamb^2] = (1 − DTT)·E[L^2]. FOURTH STEP (DIRECT/AMBIENT ENERGY DOWNMIXING): If, for example, an incoherent downmixing rule is used, the ambient energy of the left downmix channel is obtained as the weighted sum of the ambient energies of those upmix channels that are downmixed to the left downmix channel, and similarly for the direct part and for the direct and ambient parts of the right channel. Note that the above is only one downmixing rule; there may be other downmixing rules as well. FIFTH STEP (CALCULATION OF WEIGHTS FOR AMBIENCE EXTRACTION IN THE DOWNMIX CHANNELS): The DTT ratio of the left downmix channel is the ratio of its direct energy to its total (direct plus ambient) energy. The weighting factors can then be calculated as described in Figure 5 (that is, using the square root (DTT) and square root (1 − DTT) approach) or as in Figure 6 (that is, using a cross-mixing matrix method). Basically, the exemplary process described above maps the CPC, ICC and CLD parameters in the MPS stream to the ambience proportions of the downmix channels. According to further embodiments, there are typically other means to achieve similar goals, and other conditions as well. For example, there may be other downmixing rules, other loudspeaker layouts, other decoding methods, and other ways of estimating the ambience of multiple channels than the one described above, in which a specific channel is compared with the remaining channels. Although the present invention has been described in the context of block diagrams, where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding method steps, where these steps stand for the functionalities performed by the corresponding logical or physical hardware blocks. The described embodiments are merely illustrative of the principles of the present invention.
It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intention, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein. Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular a disc, a DVD or a CD having electronically readable control signals stored thereon, which cooperate with programmable computer systems such that the inventive methods are performed. Generally, the present invention can therefore be implemented as a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on a computer. In other words, the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer. The inventive encoded audio signal can be stored on any machine-readable storage medium, such as a digital storage medium. An advantage of the inventive concept and technique is that the embodiments described in this application, that is, the apparatus, method or computer program, allow estimating and extracting the direct and/or ambient components of an audio signal with the aid of spatial parametric information. In particular, the inventive processing of the present invention works in frequency bands, as is typical in the field of ambience extraction. The presented concept is relevant to audio signal processing, since there are several applications that need a separation of the direct and ambient components of an audio signal. In contrast to prior art ambience extraction methods, the present concept is not based on stereo input signals only and can also be applied to mono downmix situations. For a single downmix channel, in general, inter-channel differences cannot be computed. However, when the spatial side information is taken into account, ambience extraction is also possible in this case. The present invention is advantageous in that it uses the spatial parameters to estimate the ambience levels of the "original" signal. It is based on the concept that the spatial parameters already contain information about the inter-channel differences of the "original" stereo or multi-channel signal. Once the original stereo or multi-channel ambience levels are estimated, the direct and ambient levels in the downmix channel(s) can also be derived. This can be done by linear combinations (that is, weighted sums) of the ambient energies for the ambient part, and of the direct energies or amplitudes for the direct part. Therefore, the embodiments of the present invention provide estimation and extraction with the aid of spatial side information. Extending from this concept of side-information-based processing, the following beneficial properties or advantages exist. The embodiments of the present invention provide ambience estimation with the aid of the spatial side information and the provided downmix channels. This ambience estimation is important in cases where there is more than one downmix channel provided with the side information.
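The weighted-sum mapping of per-channel energies mentioned above can be sketched as follows. This is a minimal sketch under the assumption of mutually coherent direct parts and mutually incoherent ambient parts; the downmix weights stand for whatever rule the codec applies (e.g. the MPS downmix gains) and are not specified here.

```python
import numpy as np

def downmix_direct_ambient_energies(direct_energies, ambient_energies, weights):
    """Map per-upmix-channel direct/ambient energies onto one downmix channel.

    direct_energies, ambient_energies -- energies estimated for the N upmix channels
    weights -- downmix gains of those channels into this downmix channel
    Direct parts are treated as mutually coherent (sum of amplitudes), ambient parts
    as mutually incoherent (sum of energies); this is only one possible downmix rule.
    """
    w = np.asarray(weights, dtype=float)
    direct_dmx = np.sum(w * np.sqrt(direct_energies)) ** 2       # coherent sum
    ambient_dmx = np.sum(w ** 2 * np.asarray(ambient_energies))  # incoherent sum
    return direct_dmx, ambient_dmx
```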
The side information and the information that is measured from the downmix channels can be used together in the ambience estimation. In MPEG Surround with a stereo downmix, these two sources of information together provide complete information on the inter-channel relationships of the original multi-channel sound, and the ambience estimation is based on those relationships. The embodiments of the present invention also provide direct and ambient energy downmixing. In the described situation of side-information-based ambience extraction, there is an intermediate stage of ambience estimation for a number of channels greater than the number of provided downmix channels. Therefore, this ambience information must be mapped to the number of downmix audio channels in a valid manner. This process can be referred to as downmixing due to its correspondence to audio channel downmixing. In the simplest way, this can be done by combining the direct and ambient energies in the same way the provided downmix channels were downmixed. There is no single ideal downmixing rule; it is likely to be application dependent. For example, in MPEG Surround, it can be beneficial to treat the channels differently (center, front loudspeakers, rear loudspeakers) due to their typically different signal content. In addition, the embodiments provide an ambience estimation of multiple channels independently for each channel in relation to the other channels. This property/approach makes it possible to simply use the presented stereo ambience estimation formula for each channel in relation to all other channels. By this means, it is not necessary to assume an equal ambience level in all channels. The presented approach is based on the assumption about spatial perception that the ambient component in each channel is that component which has an incoherent counterpart in the sum of all other channels. An example suggesting the validity of this assumption is that the noise (ambience) emitted from one of two channels can be divided among the other channels with half the energy each, without significantly affecting the perceived sound scene. In terms of signal processing, this means that the actual direct/ambience ratio estimation takes place by applying the presented ambience estimation formula for each channel versus the linear combination of all other channels. Finally, the embodiments provide an application of the estimated direct and ambient energies to extract the actual signals. Once the ambient levels in the downmix channels are known, two inventive methods can be applied to obtain the ambient signals. The first method is based on simple multiplication, in which the direct and ambient parts for each downmix channel can be generated by multiplying the signal with the square root of the ratio of direct energy to total energy and with the square root of the ratio of ambient energy to total energy, respectively. This provides for each downmix channel two signals that are coherent with each other, but have the energies that the direct and ambient parts were estimated to have. The second method is based on a least-mean-squares solution with cross-channel mixing, in which the cross-channel mixing (also possible with negative signs) allows a better estimate of the direct and ambient signals than the above solution. In contrast to a least-mean-squares solution for stereo input and equal ambience levels in the channels, as provided in "Multiple-loudspeaker playback of stereo signals", C. Faller, Journal of the AES, Oct.
2007 and "Patent application title: Method to Generate Multi-Channel Audio Signal from Stereo Signals ", Inventors: Christof Faller, Agents: FISH & RICHARDSON PC, Assignees: LG ELECTRONICS, INC., Origin: MINNEAPOLIS, MN US, IPC8 Class: AH04R500FI, USPC Class: 381 1, this The invention provides a solution by the mean of least squares that does not need equal ambient levels and is also capable of extending to any number of channels. The additional properties of innovative processing are as follows. In the ambient processing for binaural interpretation, the environment can be processed with a filter that has the properties of providing interauricular coherence in the frequency bands that are similar to the interauricular coherence in the real diffuse sound fields, in which the filter can also include ambient effect. In the processing of the direct part for binaural interpretation, the direct part can be fed through the transfer functions related to the head (HRTFs) with possible addition of ambient effect, such as reflections and / or previous reverberation. In addition, a corresponding "level separation" control for dry / wet control can be performed in the additional designs. In particular, complete separation may not be desirable in many applications as this can lead to audible artifacts, such as abrupt changes, modulation effects, etc. Therefore, all relevant parts of the described processes can be implemented with a "level separation" control to control the desired and useful amount of separation. With reference to Figure 11, this level separation control is indicated by a control input 1105 of a dashed box to control direct separation / environment 1120 and / or the binaural interpretation devices 910, 1010, respectively. This control can work similar to a dry / wet control when processing audio effects. The main benefits of the presented solution are as follows. The system works in all situations, also with parametric stereo and MPEG surround with mono downmix, previous unlikely solutions that depend only on downmix information. The system is also capable of using parallel spatial information transmitted along with the audio signal in the spatial audio bit streams to more accurately estimate direct and ambient energies than with simple inter-channel analysis of downmix channels. Therefore, many applications, such as binaural processing, can benefit from applying different processing to direct and ambient parts of the sound. Achievements are based on the following psychoacoustic assumptions. Human hearing systems locate sources based on interauricular indications in time and frequency separations (areas restricted to a certain variation in frequency and time). If two or more inconsistent concomitant sources that overlap in time and frequency are presented simultaneously in different locations, the auditory system is unable to perceive the location of the sources. This is due to the sum of these sources not producing reliable interauricular indications in the listener. My auditory system thus described, in order to take from the audio scene close to the time and frequency separations, which provides reliable location information and deals with the rest of the non-locatable ones. By these means the auditory system is able to locate sources in complex sound environments. The simultaneous coherent sources have a different effect, they form approximately the same interauricular indications that a single source among the coherent sources would form. 
This is also the property that the embodiments take advantage of. The levels of the localizable (direct) and non-localizable (ambient) sound can be estimated, and these components can then be extracted. Spatialization signal processing is applied only to the localizable/direct part, while diffuseness/spaciousness/envelopment processing is applied to the non-localizable/ambient part. This provides a significant benefit in the design of a binaural processing system, since many processes can be applied only where they are needed, leaving the remaining signal unaffected. All processing takes place in frequency bands that approximate the frequency resolution of human hearing. The embodiments are based on applying the decomposition in a measured way, so as to maximize the perceptual quality while minimizing perceivable problems. By this decomposition, it is possible to obtain the direct component and the ambient component of an audio signal separately. The two components can then be further processed to achieve a desired effect or representation. Specifically, the embodiments of the present invention allow an ambience estimation with the aid of the spatial side information in the coded domain. The present invention is also advantageous in that the typical headphone reproduction problems of audio signals can be reduced by separating the signals into a direct signal and an ambient signal. The embodiments make it possible to improve existing direct/ambience extraction methods so that they can be applied to binaural sound rendering for headphone reproduction. The main use cases for processing based on spatial side information are of course MPEG Surround and parametric stereo (and similar parametric coding techniques). Typical applications that benefit from ambience extraction are binaural reproduction, due to the ability to apply a different amount of room effect to different parts of the sound, and upmixing to a greater number of channels, due to the ability to position and process different components of the sound differently. There may also be applications in which the user would want to modify the direct/ambience level ratio, for example, in order to enhance speech intelligibility.
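As a closing illustration of the "level of separation" control mentioned above, the following sketch blends fully separated binaural rendering with unseparated rendering. It is a minimal sketch: the rendering callables and the linear blending rule are assumptions for illustration, not the claimed processing.

```python
def render_with_separation_control(downmix, direct, ambient,
                                   render_direct, render_ambient,
                                   separation=1.0):
    """Dry/wet style control over how much direct/ambience separation is used.

    render_direct  -- callable applying the HRTF-based direct-part rendering
    render_ambient -- callable applying the decorrelating/room-effect ambience rendering
    separation     -- 0.0 renders the unseparated downmix, 1.0 uses full separation
    """
    separated = render_direct(direct) + render_ambient(ambient)
    unseparated = render_direct(downmix)   # no separation: whole downmix treated as direct
    return separation * separated + (1.0 - separation) * unseparated
```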
Claims (14) [0001] 1. APPARATUS (100) FOR EXTRACTING A DIRECT AND/OR AMBIENT SIGNAL (125-1, 125-2) FROM A DOWNMIX SIGNAL (115) AND SPATIAL PARAMETRIC INFORMATION (105), the downmix signal (115) and the spatial parametric information (105) representing a multi-channel audio signal (101) having more channels (Ch1 ... ChN) than the downmix signal (115), wherein the spatial parametric information (105) is characterized by comprising inter-channel relationships of the multi-channel audio signal (101), the apparatus (100) comprising: a direct/ambience estimator (110) for estimating direct level information (113) of a direct part of the multi-channel audio signal (101) and/or for estimating ambient level information (113) of an ambient part of the multi-channel audio signal (101) based on the spatial parametric information (105); and a direct/ambience extractor (120) for extracting a direct signal part (125-1) and/or an ambient signal part (125-2) from the downmix signal (115) based on the estimated direct level information (113) of the direct part or based on the estimated ambient level information (113) of the ambient part; wherein the direct/ambience extractor is configured to downmix the estimated direct level information of the direct part or the estimated ambient level information of the ambient part to acquire downmixed level information of the direct part or the ambient part, and to extract the direct signal part or the ambient signal part from the downmix signal based on the downmixed level information; wherein the direct/ambience estimator is configured to estimate the direct level information of the direct part of the multi-channel audio signal or to estimate the ambient level information of the ambient part of the multi-channel audio signal based on the spatial parametric information and at least two downmix channels of the downmix signal received by the direct/ambience estimator. [0002] 2. APPARATUS according to claim 1, characterized in that the direct/ambience extractor (420) is further configured to perform a downmix of the estimated direct level information (113) of the direct part or of the estimated ambient level information (113) of the ambient part by combining the estimated direct level information (113) of the direct part with a coherent sum and the estimated ambient level information (113) of the ambient part with an incoherent sum. [0003] 3. APPARATUS according to claim 1, characterized in that the direct/ambience extractor (520) is further configured to derive gain parameters (565-1, 565-2) from the downmixed level information (555-1, 555-2) of the direct or ambient part and to apply the derived gain parameters (565-1, 565-2) to the downmix signal (115) to obtain the direct signal part (125-1) or the ambient signal part (125-2). [0004] 4. APPARATUS according to claim 3, characterized in that the direct/ambience extractor (520) is further configured to determine a direct-to-total (DTT) or ambient-to-total (ATT) energy ratio of the downmixed level information (555-1, 555-2) of the direct or ambient part and to use as the gain parameters (565-1, 565-2) extraction parameters based on the determined DTT or ATT energy ratio.
[0005] 5. APPARATUS according to claim 1, characterized in that the direct/ambience extractor (520) is configured to extract the direct signal part (125-1) or the ambient signal part (125-2) by applying a quadratic M-by-M extraction matrix to the downmix signal (115), wherein the size (M) of the quadratic M-by-M extraction matrix corresponds to a number (M) of downmix channels (Ch1 ... ChM). [0006] 6. APPARATUS according to claim 5, characterized in that the direct/ambience extractor (520) is further configured to apply a first plurality of extraction parameters to the downmix signal (115) to obtain the direct signal part (125-1) and a second plurality of extraction parameters to the downmix signal (115) to obtain the ambient signal part (125-2), the first and second pluralities of extraction parameters constituting a diagonal matrix. [0007] 7. APPARATUS according to claim 1, characterized in that the direct/ambience estimator (110) is configured to estimate the direct level information (113) of the direct part of the multi-channel audio signal (101) or to estimate the ambient level information (113) of the ambient part of the multi-channel audio signal (101) based on the spatial parametric information (105) and at least two downmix channels (825) of the downmix signal (115) received by the direct/ambience estimator (110). [0008] 8. APPARATUS according to claim 1, characterized in that the direct/ambience estimator (710) is configured to apply a stereo ambience estimation formula using the spatial parametric information (105) for each channel (Chi) of the multi-channel audio signal (101), wherein the stereo ambience estimation formula is given by DTTi = f[CLDi(Chi, R), ICCi(Chi, R)], ATTi = 1 − DTTi, depending on a channel level difference (CLDi), which is a decibel value of ai, and an inter-channel coherence parameter (ICCi) of the channel Chi, and wherein R is a linear combination of the remaining channels. [0009] 9. APPARATUS according to claim 1, characterized in that the direct/ambience extractor (620) is configured to extract the direct signal part (125-1) or the ambient signal part (125-2) by a least-mean-squares (LMS) solution with cross-channel mixing, the LMS solution not requiring equal ambience levels. [0010] 10. APPARATUS according to claim 8, characterized in that the direct/ambience extractor (620) is configured to derive the LMS solution by assuming a signal model, so that the LMS solution is not restricted to a stereo-channel downmix signal. [0011] 11. APPARATUS according to claim 1, characterized in that the apparatus further comprises: a binaural direct sound rendering device (910) for processing the direct signal part (125-1) to obtain a first binaural output signal (915); a binaural ambient sound rendering device (1010) for processing the ambient signal part (125-2) to obtain a second binaural output signal (1015); and a combiner (1130) for combining the first (915) and the second (1015) binaural output signal to obtain a combined binaural output signal (1135). [0012] 12. APPARATUS according to claim 11, characterized in that the binaural ambient sound rendering device (1010) is configured to apply a reverberation effect and/or a filter to the ambient signal part (125-2) to provide the second binaural output signal (1015), the second binaural output signal (1015) being adapted to the interaural coherence of real diffuse sound fields. [0013] 13.
APPARATUS according to claim 11 or 12, characterized in that the binaural direct sound rendering device (910) is configured to feed the direct signal part (125-1) through filters based on head-related transfer functions (HRTFs) to obtain the first binaural output signal (915). [0014] 14. METHOD (100) FOR EXTRACTING A DIRECT AND/OR AMBIENT SIGNAL (125-1, 125-2) FROM A DOWNMIX SIGNAL (115) AND SPATIAL PARAMETRIC INFORMATION (105), the downmix signal (115) and the spatial parametric information (105) representing a multi-channel audio signal (101) having more channels (Ch1 ... ChN) than the downmix signal (115), wherein the spatial parametric information (105) is characterized by comprising inter-channel relationships of the multi-channel audio signal (101), the method (100) comprising: estimating (110) direct level information (113) of a direct part of the multi-channel audio signal (101) and/or estimating (110) ambient level information (113) of an ambient part of the multi-channel audio signal (101) based on the spatial parametric information (105); and extracting (120) a direct signal part (125-1) and/or an ambient signal part (125-2) from the downmix signal (115) based on the estimated direct level information (113) of the direct part or based on the estimated ambient level information (113) of the ambient part; wherein the extracting comprises downmixing the estimated direct level information of the direct part or the estimated ambient level information of the ambient part to acquire downmixed level information of the direct part or the ambient part, and extracting the direct signal part or the ambient signal part from the downmix signal based on the downmixed level information; wherein the estimating comprises estimating the direct level information of the direct part of the multi-channel audio signal or estimating the ambient level information of the ambient part of the multi-channel audio signal based on the spatial parametric information and at least two downmix channels of the downmix signal.